All right.
How is everybody doing today?
It's great to be here.
Thank you.
Well, I developed this presentation, and I hope that you all find the various tools on
here interesting and most of all fun and useful, as useful as I have over the years.
So you have to try that there.
All right.
There we go.
Anybody know what that's a picture of on the screen?
Just raise your hands.
Just kind of curious.
One person.
All right.
And the audio did not work.
Try that one more time, otherwise.
That's what solitaire sounds like when you play it through a sound file.
We'll get back to that.
All right.
So what can us defenders do?
You know, sometimes you've been ‑‑ I'm sure all of you have had malware attack your
system before.
It does stuff, right?
It drops files.
It changes registry keys, things like that.
You want to know what happened.
Another useful topic we'll talk about today is a file content type identification.
Just because a file has an extension, that doesn't mean that's what it is.
But we'll look a little deeper than just looking at the magic numbers and so forth.
A little bit of steg analysis, a little bit of reversing XOR encryption.
And I don't know.
There's just lots of uses for these various tools.
There's some stego tools.
These are all on the CD.
I checked this morning.
And then there's a couple of the analysis tools you'll get to see.
Attackers have lots of tools, right?
Packers, base encoders, crypters, compressors, wrappers ‑‑ oh, I wrote that one.
That's just for fun.
And various stego tools.
There's a lot of stego tools out there.
Defenders, of course, hex editors and strings, you all know what that is.
Footprint, that's one of my tools I wrote that helps identify what malware did to your system.
It takes a snapshot of files, registry keys, processes and services.
And then you can take a later snapshot and compare them.
It does that.
Write Bitmap Histogram, which is a terrible title, but I can't think of anything else
to call it.
Takes an image of a file.
And also takes some basic statistics.
You can learn a lot from just a few basic statistics.
And then the statistical analyzer kind of combines the two and automates them.
And I don't have much time to talk about that one.
But it's there, and if you want more information, my contact information is at the end of the
presentation.
A little bit about me, instead of counting sheep, I counted powers of two.
I learned how to program at 14.
I did have a couple of Atari games published.
You haven't heard of them.
It wasn't Pacman or anything.
I joined the Air Force, got my degrees, and now I'm an engineer at Harris and a part‑time
instructor at UTSA.
All right.
Wrappers is just a small utility.
Basically, it will take any file you want and wrap it up into a bitmap header or a WAV
file header of various types.
It's got a few options there.
And so it's very simple.
It's good for demos.
Here's a steg LSB tool.
This one hides in the least significant bit, which is common.
This one has five bits hidden.
So at first glance, if you didn't see the original, you might not notice.
But the original, of course, looks a little better.
But you think about that.
Five out of eight bits.
That's a lot of data.
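A minimal sketch of what hiding in the low bits means, assuming a plain array of 8-bit cover samples. The names embed_lsb and extract_lsb are hypothetical, and this is not the code of the steg LSB tool on the CD; it only shows that with five bits per byte, five eighths of the cover becomes payload capacity.

```python
def embed_lsb(cover, payload, n_bits):
    """Overwrite the low n_bits of each cover byte with payload bits (MSB first)."""
    bits = [(byte >> s) & 1 for byte in payload for s in range(7, -1, -1)]
    bits += [0] * (-len(bits) % n_bits)  # pad to a whole number of groups
    groups = len(bits) // n_bits
    if groups > len(cover):
        raise ValueError("payload does not fit in cover")
    mask = (1 << n_bits) - 1
    out = bytearray(cover)
    for i in range(groups):
        value = 0
        for bit in bits[i * n_bits:(i + 1) * n_bits]:
            value = (value << 1) | bit
        # keep the high bits of the cover byte, replace the low n_bits
        out[i] = (out[i] & (0xFF ^ mask)) | value
    return bytes(out)


def extract_lsb(stego, n_bits, payload_len):
    """Read payload_len bytes back out of the low n_bits of each stego byte."""
    mask = (1 << n_bits) - 1
    bits = []
    for b in stego:
        v = b & mask
        bits.extend((v >> s) & 1 for s in range(n_bits - 1, -1, -1))
    out = bytearray()
    for i in range(payload_len):
        byte = 0
        for bit in bits[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


cover = bytes(range(256))
stego = embed_lsb(cover, b"DEF CON", 5)
print(extract_lsb(stego, 5, 7))
print("capacity in bytes at 5 bits:", len(cover) * 5 // 8)
```

In this sketch the extractor has to be told the payload length; a real tool would need to store that somewhere in the hidden stream.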
All right.
So this one hides in JPEGs.
That's also on the CD.
That's my dog.
He texted me last night.
He's lonely.
You can't see anything, though.
You can't see any artifacts between the original and that one.
So you've got that tool on your CD as well.
We'll show you how you can detect that, though.
Malware effects, as I mentioned, does lots of different things to your system.
Also, sometimes you install these programs, trial programs, or maybe programs you don't
want anymore and you uninstall them.
Did the uninstall get rid of all the registry keys?
Did it get rid of all the files?
Footprint can help you find out.
So it takes a snapshot of the system, stores it in a big log file.
It can save it by size as well.
And date, the date part isn't working yet.
I've got to get that one done.
But if anybody wants to upgrade a version with date, you can e‑mail me.
I'm going to go ahead and do that.
But that's good for finding files that were just recently installed.
Often malware will drop a bunch of files maybe in different places, but they all have the
same date.
So this can identify stuff that's just been dropped on the system.
Or if you like getting videos that they don't let you download or pictures that you click
on them and you can't save picture as, go to your content folder and use this tool and
it will find it.
Regular Windows browsing and searching doesn't search the content folder, but if you get
there and use this tool, it will find it.
You don't have to get there.
You can run it from the C‑Drive and it will find all those files and list them out.
So then you can find that video and just simply copy it from that folder to one you want to
use it for.
Footprint can compare the two different files.
Log files, here's all the files, what has changed.
Same thing with registry keys.
Same thing with processes and services.
This is just a sample output I'm going to go through real quick just to show you if
a file was deleted or file was added.
It's a very textual type program, but here's where it shows what it looks like when a file
is modified.
And it just creates this big log file, or a small one
if there haven't been many changes since the last footprint.
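The snapshot-and-compare idea can be sketched for files only. This is an illustration, not Footprint itself: the function names, the use of SHA-256, and the dictionary layout are all assumptions, and registry keys, processes and services are left out.

```python
import hashlib
import os


def snapshot(root):
    """Map each file's relative path to (size, SHA-256 digest)."""
    snap = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    content = fh.read()
            except OSError:
                continue  # unreadable or locked file: skip it in this sketch
            rel = os.path.relpath(path, root)
            snap[rel] = (len(content), hashlib.sha256(content).hexdigest())
    return snap


def compare(before, after):
    """Report which paths were added, deleted, or modified between snapshots."""
    old, new = set(before), set(after)
    return {
        "added": sorted(new - old),
        "deleted": sorted(old - new),
        "modified": sorted(p for p in old & new if before[p] != after[p]),
    }
```

Usage would be to call snapshot before running the suspect program, call it again afterwards, and pass both results to compare.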
All right.
File type characteristics.
Malware often disguises itself, may pack stuff in executables, encrypt it.
This can help detect that.
The Write Bitmap Histogram tool will do a few things.
It can create a bitmap image.
As you saw in the beginning, that was a bitmap image of Solitaire.
One person recognized that probably as an executable.
The chart on the right was a histogram.
That's a typical histogram for an executable.
And then before discussing the tools, got to do a little bit of math.
So once we get through the math, there's a lot of slides.
I meant to mention that.
There's a lot of slides in this presentation.
So once we get through the math and you understand a little bit about the tools and
their uses, if we don't get finished, it will be easy for you to figure out on your own.
All right.
Who has heard of entropy before and knows what it is?
I'm curious.
Okay.
It looks like about half or so.
All right.
Very good.
And what about a histogram?
Do you all know what a histogram is?
Okay.
About the same people.
All right.
Usually we consider, of course, bytes with computers, 0 to 255.
So the maximum entropy is the log base 2 of the total number of symbols.
So log base 2 of 256, for 256 different symbols, is 8.
So the maximum entropy for a file can be 8.
If that file is base 32 encoded, maximum entropy is going to be 5.
I don't have an example in the slides this time, but you can actually tell if the base
32 encoding has encoded an encrypted file or if it's encoded a text file just by using
this tool.
And, well, of course, for base 64, that's a little quiz, 2 to the sixth is 64.
So who got a gold star?
One person.
All right.
Very good.
Two.
All right.
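The maximum-entropy figures just quoted are only the base-2 log of the alphabet size. A small sketch, with max_entropy_bits as a hypothetical helper name:

```python
import math


def max_entropy_bits(alphabet_size):
    """Upper bound on entropy, in bits per symbol, for a given alphabet size."""
    return math.log2(alphabet_size)


for name, size in [("raw bytes", 256), ("base 64", 64), ("base 32", 32)]:
    print(name, max_entropy_bits(size))
```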
A little bit of statistics here.
So P is the probability.
The log is often abbreviated LG to mean log base 2.
And that's simply 2 to what power equals X?
So log base 2 of 256 is 8.
Log base 2 of 4 is 2.
Log base 2 of 8 is 3.
And so on.
We can estimate the probability in a file by counting.
So you take a file, count how many zero bytes, count how many one bytes, count how many two
bytes and so on.
And that's the histogram.
That's the count, the frequency distribution of each byte is another way of putting it.
So given that count and the total number of bytes, we can compute the probability for
each byte.
So we can say, you know, if zero appeared 25 times out of 100, we can say the probability
is 0.25.
And then we can plug into this nice nifty formula here, okay, which looks complicated,
but it's really just a for loop that's multiplying the probability times the log base 2 of the
probability and adding it all up.
You'll get a negative number out of that.
I'll skip the log derivations for today.
And you add it up and you get an entropy count, H. H is the entropy.
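The counting and the for loop described above can be sketched like this. The function names are hypothetical and this is not the tool's source; it is the standard Shannon entropy over byte frequencies, H = -sum(p * log2(p)).

```python
import math


def byte_histogram(data):
    """Count how many times each byte value 0..255 occurs."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return counts


def shannon_entropy(data):
    """Entropy in bits per byte; 0.0 for empty input."""
    total = len(data)
    h = 0.0
    for count in byte_histogram(data):
        if count:
            p = count / total        # estimated probability of this byte value
            h -= p * math.log2(p)    # p * log2(p) is negative, so subtract it
    return h


sample = b"\x00" * 25 + b"\x01" * 75
print(byte_histogram(sample)[0] / len(sample))  # probability of a zero byte
print(shannon_entropy(bytes(range(256))))       # every value equally likely
```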
So encrypted files have the greatest entropy.
Compressed files are next.
Text and so on.
Every file type generally has some characteristic range of entropy.
24 bit maps I've found have been very varied, but executable files, text files, they're
kind of in a range.
Compressed and encrypted files are in a very narrow range.
So you can identify a lot just by the entropy.
So bottom line is the higher the entropy, the more uncertainty.
That's what you want in an encrypted file, right?
You don't want the opponent, the attacker, to figure out what you've encrypted.
You don't want them to have any kind of information about what symbols are.
Compression removes pattern, and once you remove pattern, you get a randomized looking file,
but it's not as random as an encrypted file.
English text I've found to be around 4.5, 4.6, 4.3.
It's in a very narrow range.
So you can identify that immediately.
Now of course it does depend on having sufficient data.
Okay.
Very small files, the entropy counts are going to be skewed.
I've found that in practice around 4K is where it starts to get reasonably accurate.
Of course the more you have, the more accurate it looks.
So histogram, I've kind of talked about that already.
Just on the chart, on the left side of the chart, that's going to be the zero count and
on the right side of the chart is the 255 count and the darker lines are at 16 value
intervals.
So at 16 the line is a little darker.
At 32 the line is a little darker just to kind of break it up a little bit.
Many file types I've discovered have unique histogram characteristics.
And so I've used that.
You can identify them very quickly in many cases.
All right.
So of course here's how you identify a file.
You've got this new file.
What is it?
Well, you look at the extension, but that doesn't mean anything, right?
You look at the magic number.
That may mean something if it's not disguised as something.
You can apply a visualization.
That's what this tool does.
It will also do the ID.
The audioization, which is kind of a very strange word, but it's actually out there.
And then statistics.
So here's what we check on the file.
You know, what's in it?
Does it match the extension?
Does it have unusual data?
Does it have hidden data?
Does it have appended data?
Is part of it compressed?
We can tell a lot.
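The first two checks, extension and magic number, can be sketched as follows. The signature table is a small hand-picked sample of well-known file signatures, and sniff and extension_matches are hypothetical names; the tools in this talk go further than this.

```python
import os

# A few well-known leading signatures (not an exhaustive list).
SIGNATURES = [
    (b"MZ", "exe"),            # DOS/PE executable
    (b"\xff\xd8\xff", "jpeg"),
    (b"BM", "bmp"),
    (b"PK\x03\x04", "zip"),
]

# Hypothetical mapping from extension to the expected content label.
EXTENSIONS = {
    ".exe": "exe", ".dll": "exe",
    ".jpg": "jpeg", ".jpeg": "jpeg",
    ".bmp": "bmp", ".wav": "wav", ".zip": "zip",
}


def sniff(data):
    """Guess a content label from the leading bytes, or None if unknown."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    for signature, label in SIGNATURES:
        if data.startswith(signature):
            return label
    return None


def extension_matches(filename, data):
    """True or False when the extension is known, None when it is not."""
    expected = EXTENSIONS.get(os.path.splitext(filename)[1].lower())
    if expected is None:
        return None
    return sniff(data) == expected
```

A match here only says the header agrees with the extension; a wrapped or disguised file can still pass, which is why the histogram and entropy checks come next.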
All right.
That's just a command line for using the histogram tool.
All the tools have usage functions, so I'm sure you can figure them out.
And here's the text file.
On the left is what it looks like.
You can see it's very dark because text is all below 128.
So it's all the darker shades of gray.
On the right, you can see the histogram.
What character is this, do you think?
Space.
That's right.
Space is the most common character in text, followed by the E and the T.
These are lower case.
Uppercase is kind of hard to see in this one.
There wasn't very many uppercase.
You've got numbers.
You can notice the pairing, carriage return line feed.
Those are all the same size there.
So you can see that.
This is the text output of the program.
So that gives you, like, the exact numbers.
So you can see the exact counts.
Sometimes it's useful because, of course, the visual one is scaled, right?
So you can't necessarily see the difference between a few values on a large histogram.
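The picture of the text file comes from treating each byte as one gray pixel, so values below 128 come out dark. A minimal sketch of that idea, assuming a binary PGM file as output instead of the bitmap the tool writes; bytes_to_pgm and the default width are hypothetical.

```python
def bytes_to_pgm(data, width=256):
    """Lay the bytes out row by row as 8-bit gray pixels in a binary PGM image."""
    height = max(1, -(-len(data) // width))          # ceiling division
    pixels = data.ljust(width * height, b"\x00")     # pad the last row with black
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + pixels


text = b"The quick brown fox\r\n" * 100
image = bytes_to_pgm(text, width=64)
print(len(image), "bytes of PGM; brightest pixel value:", max(text))
```

Writing the returned bytes to a file with a .pgm extension gives an image most viewers can open.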
Here's HTML.
So you see that has some textual characteristics.
But it also has a lot of pairings.
You know, HTML has all the tags with the braces and all so forth.
And so you can see that here.
C source code, you know, Java code and stuff shows up the same way.
You get lots of pairings.
So you can distinguish between text and C++ and that type of thing to a certain degree.
Here's a bitmap.
The one characteristic of a ‑‑ this is a bitmap of a bitmap, by the way.
It's not a ‑‑ you know, it just kind of gets out of synchronization there.
But you can see that it's smooth.
That's the characteristic of a natural bitmap.
All of them are smooth.
If they're not fairly smooth, then something's going on.
Now 8‑bit gray scale is very spiky, just like that, as well as an 8‑bit color bitmap.
We don't know where the spikes are, which values are the most common, but they all look
spiky.
And, of course, for some of you that know, a gray scale, 8‑bit gray scale and 8‑bit
color is the same in terms of the file content.
It's just the palette that's different.
Speech, all 8‑bit wave files that are natural wave files will look like this.
That's because waves oscillate about the central axis.
So you get the most values in the middle, and as you go out towards the edges, you get
fewer values.
Music is a little fuller than speech.
So you still get the central spike.
16‑bit speech, it's a little tough to notice at first, but you get ‑‑ where's my cursor ‑‑
you get a U shape.
It's kind of a ‑‑.
It's a very open U there, because there's very little in the upper extremities.
These are the upper extremities, and there's very few samples up there.
But when you get music like that, then you get a fuller U shape.
If it doesn't have a U shape, it's not 16‑bit audio.
You can take anything, just like I did with the Solitaire program, and wrap it up in a
wave file header, and you will not get this histogram.
It will not look like this.
But natural audio will all have a U‑type shape.
Or a pointed shape if it's 8‑bit.
JPEG.
This one has a lot of zeros.
You see it's pretty uniform over here.
See it's fairly flat over there.
So that's characteristic of JPEG.
Some of them are more spiky than others, but they all have a reasonably uniform distribution
across the top.
PE files typically have large numbers of zeros and large numbers of FF.
And then various values here.
Okay.
The thing about the PE file that's very characteristic.
Okay.
It has different sections.
So it looks like a text section, which is the actual code, and then various sections
in here of different data types.
They all have kind of a striped look.
Encrypted, I use a program called AxCrypt, which is just available for free download.
It's been out there a few years.
And you can see this, you can't really tell the difference between the JPEG, but this
you can.
It's very, very flat.
And that gets flatter as the file gets larger.
All right.
So file type identification.
That's kind of the overview of some of the things that the tools that you would be looking
for when you use them.
So here's this one.
Can you tell?
Compressed or encrypted?
Just by looking at the picture of the file.
Not really.
But from the histogram and especially the entropy value, this is the entropy calculation
over here, it's easy to tell.
Entropy 7.99997 for the encrypted file.
So unless the files are pretty small, you can use this to distinguish between compressed
and encrypted.
And even if they're fairly small, the entropy for the encrypted will go down, but the entropy
for the compressed will go down further.
Packed or not packed.
So here's an executable.
Is it packed?
You can't tell by looking at a hex editor.
But here it's looking pretty smooth to me.
Still has a large number of zeros, which probably throws the entropy down a little.
But this looks fairly uniform there.
So I'm going to say that's more than likely packed, unless it's just full of compressed
data.
Maybe you have an executable that's just full of a bunch of JPEGs as resources.
Packed or not packed.
Quite a difference, right?
Now this is thrown off a little bit because of the large number of zeros because all this
has to be scaled.
It's just as spiky.
But you can see the different patterns going throughout there.
I used this to examine a ROM one time. It was like a firmware download, whatever.
And there was one area that was just all white.
That was the RAM area in the firmware, where it's blank.
Here's the zoomed in histogram with the zeros kind of going off the scale.
And now you can see it looks kind of like an executable.
However, you can see a little bit of uniformity down there, kind of in the bottom.
So I would say that maybe this has some packed data in it.
But the whole thing isn't packed up.
All right.
So histograms and entropy aren't always effective.
This is the full color bit map that you saw earlier in black and white.
You see how it's fairly smooth.
And let's see if we're hiding something.
Data appended to the end of the file.
Statistics don't really tell you a lot about it.
However, if you look at the histogram, you can see that.
That's kind of unusual for a 24 bit map to have these kind of spikes in there.
And some of that just comes from experience.
I've done this.
I've done this on hundreds and hundreds of bit maps and looked at them over the years
and preparing several other talks and so forth.
Here is the bit map shown, the picture of it, and then you can see some data hiding
at the end.
Okay.
Because that's got a different characteristic there.
So that can reveal something.
Are we using steganography?
LSB steganography hides in the least significant bit.
Very difficult to see if the number of bits is less than four.
There's some cover images where you can see them.
But others where you can't.
And sometimes even at four bits in a normal picture you can't even tell.
Five bits is when you can really start to tell.
So what about with the histogram?
Well, of course.
Otherwise I wouldn't bring it up, right?
All right.
So here is Honeybee, the original.
You can see a fairly smooth histogram there.
Entropy 7.55.
And then we go to one bit of randomized data.
Tough to tell on that one.
Right?
It's not.
That would not raise my alert flag there looking at that histogram.
It's a little spiky but not too much.
Now we go to two bits.
Three bits.
Four bits.
It's getting easy to tell.
The picture, however, can you tell by the picture?
I think on this particular image the background is a little blurred.
So you've got some smoothness.
So you can actually tell in the picture a little bit.
Go back to three.
And look at the green background.
And then go to four.
And you can see little bits of discoloration there.
But in the foreground where there's lots of detail, you don't really see that.
However, the histogram is clear.
That is not ‑‑ this is not a histogram of a 24‑bit bitmap.
And neither is this one.
And that one ‑‑ that one would raise my suspicion.
So with two bits.
And let's see if we have five on here.
Oh, yeah.
Five.
I mean ‑‑ then it becomes obvious at that point.
Right, right.
So you can even tell by the picture.
Even those that didn't know what a honeybee looked like would probably think of that.
And then six bits.
And then seven bits.
Anyone want to guess what kind of data we're hiding?
Obviously it's kind of randomized data, right?
Because it's very flat over here.
If we were hiding text data, we'd get kind of a text look to the histogram at this point,
seven bits.
And then eight bits.
That's ‑‑ now you don't have a bitmap at all.
Right?
How about JPEG?
Does this work?
Well, here's my favorite pet, Mandy.
She actually looks kind of annoyed.
I mean, like, why are you taking this picture?
And that's the histogram of the JPEG.
See entropy is fairly high, but it's not like an encrypted file.
Here's Mandy with 146,256 bytes of hidden data.
Okay.
And she still looks annoyed.
But you can't really tell.
I can't tell.
I can't tell even if I flip between the two of those.
The entropy is a bit higher, though.
7.97.
Okay.
So that gets a little higher.
But still, you might find JPEGs with that much entropy.
Well, how about an image of the JPEG?
That doesn't work.
However, if we decompose the JPEG into its DCT coefficients ‑‑
And then take a histogram of that, which is where we're hiding, then it's quite obvious
that on this side, it's very matching, which is normal for a JPEG.
They don't match exactly.
I know these look like they're exact.
But if you look at the raw numbers, they're not exact.
But they're generally close.
This is going to be like a, let's see, plus 1, 2, 3, 4, 5, 6, 7, just like a plus 8 and
minus 8 or something.
But generally the plus coefficients on the lower ‑‑ you know, the lower values match
the minus coefficients.
So 1 and negative 1, 2 and negative 2 and so forth.
But when you start messing with those and hiding, they don't match anymore.
Okay.
And it's easy to figure out why.
Because if you are hiding something in a negative 1 and you change the least significant
bit, what does that number become?
No.
No.
If you have a negative 1 and you change the least significant bit ‑‑
Negative 2.
Negative 2.
Right.
If you have a 1 and you change the least significant bit, it becomes a 0.
So even if you change 1 and negative 1 evenly, they're off balance.
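The arithmetic behind that answer can be checked directly, assuming the coefficients are two's complement integers. The helper names are hypothetical, and forcing every hidden bit to zero is a deliberate simplification to make the imbalance easy to see.

```python
from collections import Counter


def flip_lsb(coefficient):
    """Toggle the least significant bit of a signed integer."""
    return coefficient ^ 1


def set_lsb(coefficient, bit):
    """Force the least significant bit to the given payload bit (0 or 1)."""
    return (coefficient & ~1) | bit


print(flip_lsb(-1))  # -2
print(flip_lsb(1))   # 0

# A perfectly balanced toy histogram, then hide a zero bit in every coefficient.
coefficients = [-2, -1, 1, 2] * 100
before = Counter(coefficients)
after = Counter(set_lsb(c, 0) for c in coefficients)
print(before[-1], before[1])  # 100 100: balanced
print(after[-2], after[2])    # 200 100: no longer balanced
```

The LSB pairs are (-2, -1) on the negative side but (0, 1) and (2, 3) on the positive side, so the pairs are not mirror images of each other, and that is what tips the histogram.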
Okay.
So that one's a little bit ‑‑ takes a little more work to hide.
There are some stego programs that try to balance those out.
The one you have will produce a histogram like this.
Doesn't try to balance them out.
No.
No.
No.
I meant the tool that is ‑‑ yeah.
The tool that I provided on the DEF CON CD for hiding in JPEG will do the hiding that
you've seen here, but it doesn't do anything to balance the DCT histograms.
Okay.
Wow.
We might actually have time for demos.
I didn't think I'd get through it this fast because there's so many slides.
People said you're not going to get through 75, 80 slides in 45 minutes.
Oh, there's lots of pictures.
Maybe you guys are just smart, right?
Am I going too fast?
Okay.
All right.
Well, let me try some demos.
All right.
So reversing XOR.
I would have put more in about this if I had known I'd get through it quite this fast.
Something XORed with itself is zero, so you have to understand how XOR works.
Something XORed with zero will be itself.
I don't think I wrote it up here, but the reason XOR is so popular in cryptography is
because when you XOR something with a key, you get ciphertext, and then when you XOR that
ciphertext with the same key, you get back the original.
So XORing it twice retrieves the original, okay?
Now notice this.
This is kind of an interesting property of XOR because a lot of malware will use XOR,
just a basic XOR encryption to kind of hide stuff.
Something XORed with a space will just change the case of a letter, okay?
So if you have an uppercase letter and you XOR it with hex two zero, it becomes a lowercase
letter.
If you have a lowercase letter, XORed with hex two zero becomes an uppercase letter.
Okay.
And, of course, that's typically the most common character in the English file, in the
English language.
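Those XOR properties are quick to confirm. A short sketch with a hypothetical xor_bytes helper that repeats the key across the data:

```python
def xor_bytes(data, key):
    """XOR data with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


print(xor_bytes(b"A", b"A"))       # something XORed with itself is zero
print(xor_bytes(b"A", b"\x00"))    # something XORed with zero is itself
print(xor_bytes(b"Hello", b" "))   # XOR with space (hex 20) flips the case

ciphertext = xor_bytes(b"secret", b"key")
print(xor_bytes(ciphertext, b"key"))  # XORing twice retrieves the original
```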
XORing with a single character doesn't even change the entropy.
Okay.
Like if you just use one key XOR and XOR the whole thing with that same key, the entropy
stays the same.
It just gets shifted a little bit.
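The reason the entropy stays the same is that a single-byte XOR only relabels the byte values; the counts themselves do not change. A sketch, with hypothetical helper names:

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Entropy in bits per byte, from byte frequencies."""
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def xor_single(data, key):
    """XOR every byte with the same one-byte key."""
    return bytes(b ^ key for b in data)


plain = b"Attack at dawn. Bring the good coffee."
cipher = xor_single(plain, 0xA7)

print(shannon_entropy(plain), shannon_entropy(cipher))
# Same set of counts, just attached to different byte values.
print(sorted(Counter(plain).values()) == sorted(Counter(cipher).values()))
```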
All right.
So here is like a text file that's been XORed with some character.
So you get the kind of same characteristic spikes kind of grouped together.
You get a brighter visual.
Because now all these values are upper side of the bit values instead of the lower side.
But it still kind of looks like a permuted English text histogram.
So that can be revealed.
So this kind of looks like an executable.
The image does.
All right.
It looks like this is fairly uniform in here.
So entropy 7.2 suggests some type of compression or encryption or maybe weak encryption.
You know, that's another thing I should point out.
The encryption, in order for it to have the 7.999 entropy, it has to be good encryption.
If you use weak encryption, then you get the same effect as if it were compressed.
So I discovered that once.
A client brought us some stuff and said, here, tell us what you can tell about these network
packets.
I said, okay.
So we did a lot of examination on it.
But one of the things I came back with, I said, well, it looks like you're using some
kind of weak encryption.
He's like, how do you know?
Like this?
Oh.
Okay.
So knowing that the first two bytes in an executable are MZ and that zero is prevalent
can also help you a little bit with that.
So in the target file, we found that two bytes were DNN.
Looking at the textual histogram, we found C, A, N, and D were much more prevalent than
others.
So we can start guessing one of those might be a space, one of those might be an E, one
of those might be a T.
And I kind of just did that.
I did some hand waving here because I didn't think I would even get to these slides.
So there's a little bit more to it than that.
But the point is that you can use the entropy and the histograms and the visual tool to
help reverse XOR encryption.
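One way to turn that guessing into code, assuming the simplest case of a single-byte key: take the most frequent ciphertext byte and assume it was a space (or a zero, for an executable). The function name is hypothetical, and this does not cover multi-byte keys or the rest of the hand-waved steps.

```python
from collections import Counter


def guess_single_byte_key(ciphertext, assumed_plain=0x20):
    """Guess the key by assuming the most common byte decodes to assumed_plain."""
    most_common_byte, _count = Counter(ciphertext).most_common(1)[0]
    return most_common_byte ^ assumed_plain


plain = b"the quick brown fox jumps over the lazy dog and keeps on running "
cipher = bytes(b ^ 0x5A for b in plain)

key = guess_single_byte_key(cipher)           # assume space for text
print(hex(key))
print(bytes(b ^ key for b in cipher))
```

For a suspected executable, passing assumed_plain=0x00 tries the zero byte instead; if the output does not look right, the next most frequent bytes are the next guesses.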
Statistical analyzer.
This one takes the Footprint program and combines it with the histogram tool to automate the
analysis.
So you set it loose on a directory.
And it will iterate through all the subdirectories.
And it will run like ten different statistics, not just entropy.
It will create histograms.
It will create bitmap images if you want.
And then it will compare it to a baseline.
So it's one of those kind of training type phase programs, compare it to a baseline.
And then it will pop out and spit out any anomalies you have.
This one says it's a JPEG, but it doesn't look like a JPEG.
It has low entropy.
Or this one says it's text, but it's got high entropy, that kind of thing.
I have presented on that particular tool before.
A few years ago.
And so I didn't obviously include the whole presentation on that here.
But if anybody is more interested, my contact information is at the end of the slide presentation.
All right.
I hope you learned something useful.
Looks like we do have some time.
So I can do a few demos for you.
There's my contact information.
And you can e‑mail me if you want.
Here's some blogs at Harris that are relevant.
This one is written by someone else.
So if you're interested in that, there you go.
And then this one is what I added to it.
And then here's some irrelevant blogs at Harris that I wrote.
That's actually a serious article.
It's not, you know, anything ‑‑ they wouldn't let me publish one.
That was bad.
Right?
All right.
And I do want to thank Mr. Greg Conti.
He's presented at Black Hat before.
And he kind of gave me the idea back in 2005 of the whole visualization concept, and that's
where a lot of this stuff was born from.
All right.
Does anybody want to see some demos?
Yes.
All right.
Let me find my screen again here.
Okay.
All right.
So what do you want to see?
Do you want to see, like, the Steg program?
Do you want to see ‑‑ what demos?
Any preferences there?
No.
I know.
It's ‑‑ okay.
There we go.
Yeah.
It's on the screen.
All right.
Let me find the ‑‑ my favorites?
All right.
Let's see what we have.
All right.
Good.
Hang on.
My favorites are the Stego tools, really.
So I like to do those, as well.
All right.
So I'll pull out the ‑‑ I'll pull out ‑‑ and where is that?
Steg JPEG.
Okay.
All right.
And then I just need a little media file here, find some JPEGs here.
I'll try to pull out some more of the interesting ones.
All right.
Oh, yeah.
This is one of my favorite JPEGs to hide in.
I actually had heard on the news about this ‑‑ I heard about this device sold by a particular
company, which I will not mention, that was supposed to detect, like, porn on your computer
or something.
So I decided to ‑‑ see if I can move this one over there.
Okay.
It doesn't want to ‑‑ okay.
Let's see.
I have to figure out how to ‑‑ I'm sorry.
I apologize.
Where is my mouse?
Do you want to duplicate?
Yeah, we can duplicate.
Let's do that.
That will work better.
Thank you.
Appreciate it.
Okay.
So this is supposed to ‑‑ I put this thing in there.
On my laptop.
And it divided the pictures into three categories, like suspicious, highly suspicious or not
suspicious.
This was the one that was the most highly suspicious.
Now, if I find that, that's not what I'm looking for.
All right.
Let me find my thing here.
Okay.
I need to just get to this directory.
And.
Okay.
So here is ‑‑ let me change that prompt, too, so it's nice and short.
So here's the Steg JPEG one.
Okay.
Just show you that in action.
And it has a number of different features to play with.
I mean, it can just take randomized input and create it from a pseudorandom number generator.
And so you can add that.
There's several parameters here.
Typically just keep A and U.
Like 4 to 8.
And quality is pretty high.
That will get you the best file hiding.
And we're going to ‑‑ let's see.
We'll try to hide this one in there.
75 K and 209 K. All right.
So we'll try to hide the flower in the baboon there.
So do that.
Tell it dash hide.
And then we need a cover file, which is the JPEG.
Oh, and it will take either bit map or JPEG as a cover file.
And then it will convert that to a JPEG on the output.
And then the message file is a flower.
It may not fit, but we'll give it a try here.
100 quality.
And I'll just go for the max.
On what?
The message file can be any arbitrary file.
Anything.
It doesn't care.
It reads it as a stream of bits and hides it.
Okay.
Any other questions?
Please come to the microphone.
Go ahead.
I'm sorry.
Yeah.
The question was what kind of files can you hide?
Does it have to be a picture file?
No.
It can hide any arbitrary file.
And the steg LSB program, that's the same way.
It can hide any arbitrary file.
You have to give it the dash LSB option, okay, which is in the usage.
Because it has a special demo mode.
All right.
So here's what it said.
It said our storage capacity was about 146,000 bytes.
The message size was about 75,000 bytes:
75,966.
Okay.
So we can look at the resulting file here.
Can't tell anything.
Isn't that a pretty baboon?
I took that picture in Africa.
All right.
So you can't see anything.
Now, of course, no steganography is complete without extraction, right?
Because you can always say, yeah, it's hidden in there, right?
So let's see what we can do with that.
So we got a stego file, and that's going to be the hid file.
And then it should pick up the quality okay, but these parameters, the dash A and the
dash U, have to match.
And then let's see ‑‑ oh, now I need to tell it to extract.
The command line is very archaic.
This would be much better with a GUI.
So if anybody likes developing Windows GUIs and they want to develop one, great, send
it to me.
I appreciate it.
Okay.
So that's a good sign, right, if I get the same message size.
Okay.
And let's see.
Where is it?
And it extracted that to this.
Now, how did it know the file name?
Well, I have to put in, in addition to the file data, I put in the size, and I went ahead
and stored a null terminated file name.
So the first four bytes are the size, then the null terminated file name, and then the
rest of the data.
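That layout, size first, then the null terminated file name, then the data, can be sketched as below. The byte order, whether the size counts only the file data, and the UTF-8 name encoding are all assumptions here; the talk does not say how the tool does it, and the function names are hypothetical.

```python
import struct


def pack_payload(filename, data):
    """4-byte size, null terminated file name, then the file data."""
    # Assumption: little-endian unsigned 32-bit size covering only the data.
    return struct.pack("<I", len(data)) + filename.encode("utf-8") + b"\x00" + data


def unpack_payload(blob):
    """Recover (filename, data) from the start of an extracted bit stream."""
    (size,) = struct.unpack_from("<I", blob, 0)
    name_end = blob.index(b"\x00", 4)            # terminator after the size field
    filename = blob[4:name_end].decode("utf-8")
    data = blob[name_end + 1:name_end + 1 + size]
    return filename, data


blob = pack_payload("flower.jpg", b"\xff\xd8\xff example bytes")
print(unpack_payload(blob + b"unused cover capacity"))
```

Storing the size is what lets the extractor stop at the right place and ignore whatever follows in the rest of the cover's capacity.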
And I'm just going to add a .JPEG extension on to the end of this one.
And you'll be able to see how I do.
And there's the flower picture.
You know what?
I should have shown you the original, right?
There's the original.
You're all like, yeah, it works.
I could have extracted anything.
All right.
Now let's do the WBH thing here.
There is WBH, right?
And then this one is a simple command line tool also.
So we'll put the hid JPEG
file there. And the dash B option creates the image of it. So there it is. By the way,
this bit entropy, I tried that out with zeros and ones. It doesn't work. It doesn't tell
anything. Yes. It's exactly the same byte for byte. Okay.
Exactly the same. Yep. Thank you. The question was, does the recovered file have the exact
checksum as the original? And the answer is yes, because the original file is stored
in there exactly byte for byte. There's no loss. Thank you. So the bit entropy here,
that was kind of experimental. That didn't work out too well. But we see 7.97 with the
hidden data. So we've taken compressed data and hidden it in another compressed file.
We can look at the textual histogram, just for grins there. So you can kind of scroll
down. And you can see the exact counts and the exact distribution and so forth
of that. Okay. And then the ‑‑
where is it? The histogram, the bit map histogram. Okay. It's very uniform here except for a
lot of zeros. If you want to ‑‑ we can use a tool to get a closer look at this. So
like I will ‑‑ I'll use what's called the Zoom feature. And I'll run the same thing
again. Except this time I'll use dash Z and 5. Okay.
And now we have one ‑‑ let me stretch this out a little bit. With a Z5, that's what
that is. I over‑zoomed a little bit. Let's try 3. So very easily I can do 3. And then
go to the Zoom 3. And now you can kind of get a closer look at that area. In fact, that
would be actually a good one to use on an executable, right? Because it's not quite as uniform.
So we can do WBH of itself.
All right. And then take a look at that one. And here's the histogram of WBH at Zoom 1.
So 0 and 255 kind of mask out some of those. So we can use the Zoom feature. And dash Z
and 3. And now you can see that a little closer. Okay.
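One plausible way a zoom like this could work is to multiply every bar by the zoom factor and clip at the top of the plot, so the big spikes at 0 and 255 stop flattening everything else. How WBH actually implements dash Z is not stated in the talk, so treat the function below, including its name and parameters, as a hypothetical sketch.

```python
def scaled_bar_heights(counts, height=64, zoom=1):
    """Scale histogram counts to bar heights, clipping anything above `height`.

    With zoom=1 the tallest bar just reaches the top. A larger zoom makes
    small counts visible, while the tallest bars simply saturate.
    """
    peak = max(counts) or 1  # avoid dividing by zero on an all-zero histogram
    return [min(height, (count * height * zoom) // peak) for count in counts]
```

For counts of 1000, 10, 20 and 1000 on a plot 100 units high, zoom 1 gives bars of 100, 1, 2 and 100, while zoom 5 gives 100, 5, 10 and 100.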
So the image ‑‑ where did we put that one? That one was the baboon ‑‑ this
one here. That's the image of the JPEG with the hidden data. Can't really tell anything
from that. So ‑‑
Can you show a picture where we can actually see the flower in the baboon?
It is with a bit map where you see the flower behind it or whatever. I have time. We have
the ‑‑
It's hard technology.
So these two folders here, by the way, are on your disk. So there's some tools that I
just included that you can also download for free just like I did. So here's the Steg
LSB one. So we'll take that back over to the demo area here. Let me clear out some of these
other things. Just to make room. Okay. So now I have Steg LSB in there. Now all I need
are some bitmaps to let me grab some of those. Okay. Let's see here. Oh, there we go. All
right. Now the only thing with this picture-in-picture
steganography is that it's really not useful. I mean, this is for playing
with. Right? Because everything has to be the exact same size. So I'm going
to take the upper four bits of one picture and stuff them in the bottom four bits of the
other picture.
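The four-bit trick can be sketched on raw pixel bytes as follows. This is a minimal illustration rather than the Steg LSB tool itself: it assumes both pictures have already been decoded to equal-length byte strings of 8-bit channel values, and the function names are made up.

```python
def hide_pixels(cover: bytes, secret: bytes) -> bytes:
    """Keep the cover's upper 4 bits; store the secret's upper 4 bits below them."""
    if len(cover) != len(secret):
        raise ValueError("both pictures must be exactly the same size")
    return bytes((c & 0xF0) | (s >> 4) for c, s in zip(cover, secret))


def reveal_pixels(stego: bytes) -> bytes:
    """Move the hidden lower 4 bits back up to rebuild a coarse secret picture."""
    return bytes((p & 0x0F) << 4 for p in stego)
```

The recovered picture only has 16 levels per channel, since the secret's lower four bits are thrown away, and the cover loses its own lower four bits as well.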
So let's see here, we have that one is 36, 1711, I don't know, let's see, I have to find
the right media file, 768 by 512, nope, I may not have the right size there.
These are all different sizes apparently.
Well I might have to just do LSB on that one for now.
I'll show you how to do it, but it's going to come back and tell me that the files aren't
the right matching size, just trying to think where ‑‑ I know I have files on here
somewhere that ‑‑ I'm going to try on my backup disk here ‑‑ yeah, I could
do that.
But it has to match exactly.
So I'm sure that I have one.
Nope.
Maybe not.
I really didn't expect to get done as fast as I did, but I appreciate your patience
with this.
I went to the wrong ‑‑ that's going to be the same thing as what I have here.
Well let me just show, since we've just got a couple minutes, let me just show the files
and then we can meet later on.
You had a question?
Yeah.
Okay.
So when somebody ‑‑ neither the sender nor the receiver ‑‑ has messed with the encoding,
like to make thumbnails or something, I realize that you wouldn't be able to get the original
hidden data, but would you be able to see that something had actually been hidden
in there?
Well, if it was unmessed with, it wouldn't have anything hidden.
No, no, no, no.
Okay.
I want to hide something.
Send it.
Oh, no, if you ‑‑ the question is if I do a transcoding, like when you post stuff
to Facebook and Flickr, a lot of times they'll change it for you and change the quality or
whatever or shrink it. You can't recover your data, but can you tell something was hidden?
No, because all the coefficients get scrambled up again and made anew.
Okay. Yeah. You have a question? Yes.
Excuse me, have you ever been calling ‑‑ I have.
Sure. What's your question?
So in your example ‑‑
Yeah. You used two images that had no relationship
to each other. Let's say that we're talking about somebody who ‑‑ like a journalist
in a conflict area and they have a picture that is like the street before the demonstration
and the street after. So there's a lot of overlaps. Is it possible to use steganography
to essentially reuse the cover image as part of the steganography so you just kind of encode
the deltas?
Yes.
And so you can essentially take advantage of the original image where there are exact
or close similarities?
You can certainly encode a delta. There might not be any reason to. When you take two different
pictures, the way the camera is going to encode it, the way the light hits, everything
is going to change. You're very likely going to have completely
different images.
All right.
Mathematically you're going to have very different images anyway. So you can hide
both of those in two different images or hide them both in the same image and you can't
really tell that they're related in any way.
Okay. So you can't take advantage of any common runs of the similar ‑‑
In a JPEG there won't be much common. In a bitmap ‑‑
If they were both recorded as bitmaps, then you would probably be able to find some commonality.
But I did not cover in this presentation the math behind the JPEG, but it's easy once you
know how to use it.
But it looks very complex.
You multiply that summation by cosines and stuff and it gets to be very complex.
And so everything is interdependent.
You have basically an 8 by 8 matrix and you run it all through a bunch of math at different
frequencies.
It's kind of like a frequency correlation of your image.
And so you just change one little value in there and it's all interdependent.
It goes this way and this way and all that.
So it gets to be completely different when you have JPEG.
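To make the interdependence concrete, here is a small pure-Python sketch of the 8 by 8 discrete cosine transform that JPEG is built on, showing that nudging a single frequency coefficient moves every pixel in the block. It uses the textbook orthonormal DCT-II and leaves out the quantization, color conversion and entropy coding that a real JPEG codec adds; the function names are made up for illustration.

```python
import math

N = 8  # JPEG transforms the image in 8x8 blocks


def _scale(k):
    # Orthonormal DCT-II scale factor.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(i, k):
    # Cosine basis: sample position i, frequency index k.
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct2(block):
    """Forward 2-D DCT: 8x8 pixels to 8x8 frequency coefficients."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse 2-D DCT: 8x8 frequency coefficients back to 8x8 pixels."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def pixels_changed(block, u, v, delta=8.0, tol=1e-9):
    """Count pixels that move when one coefficient is nudged by `delta`."""
    coeffs = dct2(block)
    coeffs[u][v] += delta
    rebuilt = idct2(coeffs)
    return sum(1 for x in range(N) for y in range(N)
               if abs(rebuilt[x][y] - block[x][y]) > tol)
```

Calling pixels_changed on a flat gray block with u=1 and v=1 reports that all 64 pixels changed, which is why two JPEGs of nearly the same scene share very little at the coefficient level.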
Bitmap, yeah, there could be a lot of similarity.
You got a blue sky.
It's very similar.
But with JPEG, if you're off by one, it's not ‑‑ Yes?
You were talking about the tool running it from the directory.
So this file ‑‑ I doubt this file is the JPEG file.
I doubt this file.
Can we see that?
I don't know.
Well, we're out of time for today.
And I don't have that tool on the disk.
It's a little bit more complex to use.
You have to take a baseline first and stuff.
But if you e‑mail me, we'll see ‑‑ I'll see about how I can get it to you or whatever.
I don't have a problem with giving it out.
It's just fairly complex to use.
Yeah?
Can you put your contact information in there?
Oh, sure.
I can do that.
It's fairly easy.
It's just stego at satx.rr.com.
Okay.
You do have time for one more question.
Okay.
You mentioned wanting a GUI.
Is this open source?
You have on the disk some of the source code.
I honestly don't remember exactly which source code I put on there.
I know I put the wrapper source code and the WBH.
They're all written by me.
But if the source code is not there to some of these programs and you want it, I'll give
it to you.
Cool.
Just e‑mail me.
Yep.
All right.
Well, thank you very much.
I appreciate it.
Thank you.
